Multi-terminal cross-point synaptic device using nanocrystal dot structures

ABSTRACT

Described herein is a crossbar array that includes a cross-point synaptic device at each of a plurality of crosspoints. The cross-point synaptic device includes a weight storage element comprising a set of nanocrystal dots. Further, the cross-point synaptic device includes at least three terminals for interacting with the weight storage element, wherein a weight is stored in the weight storage element by sending a first electric pulse via a gate terminal from the at least three terminals, the first electric pulse causes the nanocrystal dots to store a corresponding charge, and the weight is erased from the weight storage element by sending a second electric pulse via the gate terminal, the second electric pulse having an opposite polarity of the first electric pulse.

BACKGROUND

The present invention relates in general to configurations of trainable cross-point synaptic devices, which are referred to herein as in-memory computing unit cells; one specific example of an in-memory computing unit cell is a resistive processing unit (RPU). More specifically, the present invention relates to deep neural networks (DNNs) formed from crossbar arrays that provide local data storage and local data processing without the need for additional processing elements, thereby accelerating the DNN's ability to implement algorithms such as matrix multiplication, matrix decomposition and the like. In particular, the present invention describes an in-memory computing unit cell that uses nanocrystal dot structures.

Technical problems such as character recognition and image recognition by a computer are known to be well handled by machine-learning techniques. “Machine learning” is used to broadly describe a primary function of electronic systems that learn from data. In machine learning and cognitive science, artificial neural networks (ANNs) are a family of statistical learning models inspired by the biological neural networks of animals, and in particular the brain. ANNs can be used to estimate or approximate systems and functions that depend on a large number of inputs and are generally unknown. ANNs can include DNNs, convolutional neural networks (CNNs), and other types of neural networks. Crossbar arrays are high density, low cost circuit architectures used to form a variety of electronic circuits and devices, including ANN architectures, neuromorphic microchips and ultra-high density nonvolatile memory. A basic crossbar array configuration includes a set of conductive row wires and a set of conductive column wires formed to intersect the set of conductive row wires. The intersections between the two sets of wires are separated by so-called cross-point devices.

SUMMARY

According to one or more embodiments of the present invention, a crossbar array includes a cross-point synaptic device at each of a plurality of crosspoints. The cross-point synaptic device includes a weight storage element comprising a set of nanocrystal dots. Further, the cross-point synaptic device includes at least three terminals for interacting with the weight storage element, wherein a weight is stored in the weight storage element by sending a first electric pulse via a gate terminal from the at least three terminals, the first electric pulse causes the nanocrystal dots to store a corresponding charge, and the weight is erased from the weight storage element by sending a second electric pulse via the gate terminal, the second electric pulse having an opposite polarity of the first electric pulse.

According to one or more embodiments of the present invention, a system includes a controller, and a crossbar array coupled with the controller, the crossbar array to operate as a neural network. The crossbar array includes a cross-point synaptic device at each of a plurality of crosspoints. The cross-point synaptic device includes a weight storage element comprising a set of nanocrystal dots. Further, the cross-point synaptic device includes at least three terminals for interacting with the weight storage element, wherein a weight is stored in the weight storage element by sending a first electric pulse via a gate terminal from the at least three terminals, the first electric pulse causes the nanocrystal dots to store a corresponding charge, and the weight is erased from the weight storage element by sending a second electric pulse via the gate terminal, the second electric pulse having an opposite polarity of the first electric pulse.

According to one or more embodiments of the present invention, a method for controlling weight stored in a crossbar array used for implementing a neural network includes updating a weight stored by a weight storage element of a cross-point synaptic device at each of a plurality of crosspoints of the crossbar array by sending a first electric pulse via a gate terminal of the weight storage element, the first electric pulse causes a set of nanocrystal dots in the gate terminal to store a corresponding charge representing the weight. The method further includes erasing the weight stored in the weight storage element by sending a second electric pulse via the gate terminal, the second electric pulse having an opposite polarity of the first electric pulse.

Additional technical features and benefits are realized through the techniques of the present invention. Embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed subject matter. For a better understanding, refer to the detailed description and to the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The examples described throughout the present document will be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale. Moreover, in the figures, like-referenced numerals designate corresponding parts throughout the different views.

FIG. 1 depicts a simplified diagram of input and output connections of a biological neuron;

FIG. 2 depicts a known simplified model of the biological neuron shown in FIG. 1;

FIG. 3 depicts a known simplified model of an ANN incorporating the biological neuron model shown in FIG. 2;

FIG. 4 depicts a simplified illustration of a typical read-process-write weight update operation;

FIG. 5 illustrates an artificial neural network (ANN) architecture;

FIG. 6 is a diagram of a two-dimensional (2D) crossbar array according to one or more embodiments of the present invention;

FIG. 7A and FIG. 7B depict examples of cross-point synaptic devices that use nanocrystal gate for weight storage according to one or more embodiments of the present invention;

FIG. 7C depicts an example cross-point synaptic device that has multiple layers of nanocrystal dots according to one or more embodiments of the present invention; and

FIG. 8 depicts a system for performing a matrix-matrix multiplication using a crossbar array according to one or more embodiments of the present invention.

The diagrams depicted herein are illustrative. There can be many variations to the diagram or the operations described therein without departing from the spirit of the invention. For instance, the actions can be performed in a differing order or actions can be added, deleted or modified. Also, the term “coupled” and variations thereof describes having a communications path between two elements and does not imply a direct connection between the elements with no intervening elements/connections between them. All of these variations are considered a part of the specification.

DETAILED DESCRIPTION

Described herein are technical solutions for cross-point synaptic devices, such as resistive processing unit (RPU) devices, based on a 3-terminal nanocrystal dot structure. The technical solutions include using cross-point synaptic devices, such as those configured in a crosspoint (or crossbar) array for training neural networks. As such the technical solutions are rooted in and/or tied to computer technology in order to overcome a problem specifically arising in the realm of computers, specifically training neural networks, such as by using matrix operations like matrix-matrix multiplication. The technical solutions described herein facilitate speeding up deep neural network processing and enable online training capability.

Various embodiments of the invention are described herein with reference to the related drawings. Alternative embodiments of the invention can be devised without departing from the scope of this invention. Various connections and positional relationships (e.g., over, below, adjacent, etc.) are set forth between elements in the following description and in the drawings. These connections and/or positional relationships, unless specified otherwise, can be direct or indirect, and the present invention is not intended to be limiting in this respect. Accordingly, a coupling of entities can refer to either a direct or an indirect coupling, and a positional relationship between entities can be a direct or indirect positional relationship. Moreover, the various tasks and process steps described herein can be incorporated into a more comprehensive procedure or process having additional steps or functionality not described in detail herein.

The following definitions and abbreviations are to be used for the interpretation of the claims and the specification. As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having,” “contains” or “containing,” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a composition, a mixture, process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but can include other elements not expressly listed or inherent to such composition, mixture, process, method, article, or apparatus.

Additionally, the term “exemplary” is used herein to mean “serving as an example, instance or illustration.” Any embodiment or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or designs. The terms “at least one” and “one or more” can be understood to include any integer number greater than or equal to one, i.e. one, two, three, four, etc. The term “a plurality” can be understood to include any integer number greater than or equal to two, i.e. two, three, four, five, etc. The term “connection” can include both an indirect “connection” and a direct “connection.”

The terms “about,” “substantially,” “approximately,” and variations thereof, are intended to include the degree of error associated with measurement of the particular quantity based upon the equipment available at the time of filing the application. For example, “about” can include a range of ±8% or 5%, or 2% of a given value.

For the sake of brevity, conventional techniques related to making and using aspects of the invention may or may not be described in detail herein. In particular, various aspects of computing systems and specific computer programs to implement the various technical features described herein are well known. Accordingly, in the interest of brevity, many conventional implementation details are only mentioned briefly herein or are omitted entirely without providing the well-known system and/or process details.

It is understood in advance that although one or more embodiments are described in the context of biological neural networks with a specific emphasis on modeling brain structures and functions, implementation of the teachings recited herein is not limited to modeling a particular environment. Rather, embodiments of the present invention are capable of modeling any type of environment, including for example, weather patterns, arbitrary data collected from the internet, and the like, as long as the various inputs to the environment can be turned into a vector.

Artificial neural networks (ANNs) can be used to estimate or approximate systems and functions that depend on a large number of inputs and are generally unknown. Neural networks use a class of algorithms based on a concept of interconnected “neurons.” In a typical neural network, neurons have a given activation function that operates on the inputs. By determining proper connection weights (a process also referred to as “training”), a neural network achieves efficient recognition of desired patterns, such as images and characters. Oftentimes, these neurons are grouped into “layers” in order to make connections between groups more obvious and to ease the computation of values. Training the neural network is a computationally intense process.

ANNs are often embodied as so-called “neuromorphic” systems of interconnected processor elements that act as simulated “neurons” and exchange “messages” between each other in the form of electronic signals. Similar to the so-called “plasticity” of synaptic neurotransmitter connections that carry messages between biological neurons, the connections in ANNs that carry electronic messages between simulated neurons are provided with numeric weights that correspond to the strength or weakness of a given connection. The weights can be adjusted and tuned based on experience, making ANNs adaptive to inputs and capable of learning. For example, an ANN for handwriting recognition is defined by a set of input neurons which can be activated by the pixels of an input image. After being weighted and transformed by a function determined by the network's designer, the activations of these input neurons are then passed to other downstream neurons, which are often referred to as “hidden” neurons. This process is repeated until an output neuron is activated. The activated output neuron determines which character was read.

Crossbar arrays, also known as crosspoint arrays or crosswire arrays, are high density, low cost circuit architectures used to form a variety of electronic circuits and devices, including ANN architectures, neuromorphic microchips and ultra-high density nonvolatile memory. A basic crossbar array configuration includes a set of conductive row wires and a set of conductive column wires formed to intersect the set of conductive row wires. The intersections between the two sets of wires are separated by so-called crosspoint devices, which can be formed from thin film material.

Crosspoint devices, in effect, function as the ANN's weighted connections between neurons. Nanoscale devices, for example memristors having “ideal” conduction state switching characteristics, are often used as the crosspoint devices in order to emulate synaptic plasticity with high energy efficiency. The conduction state (e.g., resistance) of the ideal memristor material can be altered by controlling the voltages applied between individual wires of the row and column wires. Digital data can be stored by alteration of the memristor material's conduction state at the intersection to achieve a high conduction state or a low conduction state. The memristor material can also be programmed to maintain two or more distinct conduction states by selectively setting the conduction state of the material. The conduction state of the memristor material can be read by applying a voltage across the material and measuring the current that passes through the target crosspoint device. So far, however, there have been some practical drawbacks in memristor and other resistive-random-access-memory (RRAM) based crosspoint devices that are detrimental to ANN applications; significant device-to-device variability and the asymmetry between “set (i.e. to increment resistance)” and “reset (i.e. to decrement resistance)” operations are two such main limitations.

In order to limit power consumption, the crosspoint devices of ANN chip architectures are often designed to utilize offline learning techniques, wherein the approximation of the target function does not change once the initial training phase has been resolved. Offline learning allows the crosspoint devices of crossbar-type ANN architectures to be simplified such that they draw very little power.

Notwithstanding the potential for lower power consumption, executing offline training can be difficult and resource intensive because it is typically necessary during training to modify a significant number of adjustable parameters (e.g., weights) in the ANN model to match the input-output pairs for the training data. Accordingly, simplifying the crosspoint devices of ANN architectures to prioritize power-saving, offline learning techniques typically means that training speed and training efficiency are not optimized.

Providing simple crosspoint devices that keep power consumption within an acceptable range, as well as accelerate the speed and efficiency of training ANN architectures, would improve overall ANN performance and allow a broader range of ANN applications. Accordingly, described herein are technical solutions that facilitate using and training crosspoint arrays that support online training. For example, an ANN using cross-point synaptic devices can perform online learning in a parallel processing manner. However, for this, the cross-point synaptic devices have to facilitate back-propagation updating. This can require the cross-point synaptic device to show a symmetric updating characteristic, which means that previous state data can be retrieved after an updating pass of the ANN learning is completed. This is a technical challenge, particularly for material development and device optimization.

The technical solutions described herein address such technical challenges for the cross-point synaptic devices, which is different from typical spike neuron solutions. As described in detail further, the technical solutions described herein utilize floating nanocrystal dots (gates) in the cross-point synaptic devices to realize the multiple states during the ANN training. During the training, updating the weights of the cross-point synaptic devices depends on the pulse applied and charges stored in the discrete nanocrystal dots. When the updated weights are to be accessed, for example during forward and/or backward propagation, or even during the update, the weights can be read in a non-disturbing manner, for example through a 3-terminal approach.
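The store, erase and read behavior described above can be illustrated with a toy software model. This is only a sketch under stated assumptions, not a device model: the class name `NanocrystalGate`, the fixed charge step `dq` per pulse, the symmetric saturation limit `max_charge` and the integer charge units are all hypothetical and chosen for illustration.

```python
class NanocrystalGate:
    """Toy model of a floating nanocrystal-dot gate (hypothetical, illustrative only)."""

    def __init__(self, dq=1, max_charge=100):
        self.dq = dq                  # ASSUMPTION: fixed charge step per pulse (arbitrary units)
        self.max_charge = max_charge  # ASSUMPTION: symmetric saturation limit of the dots
        self.charge = 0               # stored charge represents the weight

    def pulse(self, polarity):
        """Apply one gate pulse: +1 adds charge, -1 (opposite polarity) removes it."""
        if polarity not in (+1, -1):
            raise ValueError("polarity must be +1 or -1")
        new_charge = self.charge + polarity * self.dq
        # Clamp to the assumed storage range.
        self.charge = max(-self.max_charge, min(self.max_charge, new_charge))

    def read(self):
        """Non-disturbing read: returns the weight without changing the stored charge."""
        return self.charge


gate = NanocrystalGate()
for _ in range(3):
    gate.pulse(+1)      # three programming pulses
before = gate.read()
gate.pulse(+1)          # one further update...
gate.pulse(-1)          # ...undone by a pulse of opposite polarity
after = gate.read()
```

In this idealized model an update followed by an opposite-polarity pulse restores the previous state, which is the symmetric updating characteristic the text calls for; a physical device would only approximate this.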

In one or more examples, weight elements are stored in nanocrystal dots associated with each cross-point synaptic device. Here, “weight” refers to a computational value being used during computations of an ANN as described further.

Although embodiments of the present invention are directed to electronic systems, for ease of reference and explanation various aspects of the electronic systems are described using neurological terminology such as neurons, plasticity and synapses, for example. It will be understood that for any discussion or illustration herein of an electronic system, the use of neurological terminology or neurological shorthand notations is for ease of reference and is meant to cover the neuromorphic, ANN equivalent(s) of the described neurological function or neurological component.

Instead of utilizing the traditional digital model of manipulating zeros and ones, ANNs create connections between processing elements that are substantially the functional equivalent of the core system functionality that is being estimated or approximated. For example, IBM™'s SyNapse™ computer chip is the central component of an electronic neuromorphic machine that attempts to provide similar form, function and architecture to the mammalian brain. Although the IBM SyNapse computer chip uses the same basic transistor components as conventional computer chips, its transistors are configured to mimic the behavior of neurons and their synapse connections. The IBM SyNapse computer chip processes information using a network of just over one million simulated “neurons,” which communicate with one another using electrical spikes similar to the synaptic communications between biological neurons. The IBM SyNapse architecture includes a configuration of processors (i.e., simulated “neurons”) that read a memory (i.e., a simulated “synapse”) and perform simple operations. The communications between these processors, which are typically located in different cores, are performed by on-chip network routers.

A general description of how a typical ANN operates will now be provided with reference to FIGS. 1, 2 and 3. As previously noted herein, a typical ANN models the human brain, which includes about one hundred billion interconnected cells called neurons. FIG. 1 depicts a simplified diagram of a biological neuron 102 having pathways 104, 106, 108, 110 that connect it to upstream inputs 112, 114, downstream outputs 116 and downstream “other” neurons 118, configured and arranged as shown. Each biological neuron 102 sends and receives electrical impulses through pathways 104, 106, 108, 110. The nature of these electrical impulses and how they are processed in biological neuron 102 are primarily responsible for overall brain functionality. The pathway connections between biological neurons can be strong or weak. When a given neuron receives input impulses, the neuron processes the input according to the neuron's function and sends the result of the function to downstream outputs and/or downstream “other” neurons.

Biological neuron 102 is modeled in FIG. 2 as a node 202 having a mathematical function, f(x) depicted by the equation shown in FIG. 2. Node 202 takes electrical signals from inputs 212, 214, multiplies each input 212, 214 by the strength of its respective connection pathway 204, 206, takes a sum of the inputs, passes the sum through a function, f(x), and generates a result 216, which can be a final output or an input to another node, or both. In the present description, an asterisk (*) is used to represent a multiplication. Weak input signals are multiplied by a very small connection strength number, so the impact of a weak input signal on the function is very low. Similarly, strong input signals are multiplied by a higher connection strength number, so the impact of a strong input signal on the function is larger. The function f(x) is a design choice, and a variety of functions can be used. An example design choice for f(x) is the hyperbolic tangent function, which takes the function of the previous sum and outputs a number between minus one and plus one.
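As a minimal sketch of the node model just described, the following computes a weighted sum of the inputs and passes it through the hyperbolic tangent. The function name `neuron` and the example input values are illustrative assumptions and do not appear in FIG. 2.

```python
import math


def neuron(inputs, strengths, f=math.tanh):
    """Multiply each input by its connection strength, sum, and apply f."""
    total = sum(x * w for x, w in zip(inputs, strengths))
    return f(total)


# The same inputs with weak and with strong connection strengths.
weak = neuron([1.0, 1.0], [0.01, 0.01])
strong = neuron([1.0, 1.0], [2.0, 2.0])
```

With weak connection strengths the result stays close to zero, while strong connection strengths push the result toward plus one, matching the qualitative behavior described above.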

FIG. 3 depicts a simplified ANN model 300 organized as a weighted directional graph, wherein the artificial neurons are nodes (e.g., 302, 308, 316), and wherein weighted directed edges (e.g., m1 to m20) connect the nodes. It should be noted that the ANN model 300 depicted is a fully connected DNN; however, the technical solutions described herein are not limited to a fully connected DNN and are also applicable to other types of ANNs, such as a CNN, a partially connected DNN, and the like. ANN model 300 is organized such that nodes 302, 304, 306 are input layer nodes, nodes 308, 310, 312, 314 are hidden layer nodes and nodes 316, 318 are output layer nodes. Each node is connected to every node in the adjacent layer by connection pathways, which are depicted in FIG. 3 as directional arrows having connection strengths m1 to m20. Although only one input layer, one hidden layer and one output layer are shown, in practice, multiple input layers, hidden layers and output layers can be provided.

Similar to the functionality of a human brain, each input layer node 302, 304, 306 of ANN 300 receives inputs x1, x2, x3 directly from a source (not shown) with no connection strength adjustments and no node summations. Accordingly, y1=f(x1), y2=f(x2) and y3=f(x3), as shown by the equations listed at the bottom of FIG. 3. Each hidden layer node 308, 310, 312, 314 receives its inputs from all input layer nodes 302, 304, 306 according to the connection strengths associated with the relevant connection pathways. Thus, in hidden layer node 308, y4=f(m1*y1+m5*y2+m9*y3), wherein * represents a multiplication. A similar connection strength multiplication and node summation is performed for hidden layer nodes 310, 312, 314 and output layer nodes 316, 318, as shown by the equations defining functions y5 to y9 depicted at the bottom of FIG. 3.
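The layer-by-layer computation above can be sketched for the 3-4-2 network as follows. The indexing of m1 to m12 is inferred from the stated equation y4=f(m1*y1+m5*y2+m9*y3); the assignment of m13 to m20 to the output layer, the use of tanh for f, and all numeric values are assumptions made only for illustration.

```python
import math


def forward(x, m, f=math.tanh):
    """Forward pass of a 3-4-2 network; m maps index 1..20 to a connection strength."""
    # Input layer: y1..y3 = f(x1)..f(x3), with no connection strengths applied.
    y_in = [f(xi) for xi in x]
    # Hidden layer: node j (0..3) uses m[1+j], m[5+j], m[9+j], so y4 uses m1, m5, m9.
    y_hid = [f(sum(m[1 + 4 * i + j] * y_in[i] for i in range(3)))
             for j in range(4)]
    # ASSUMPTION: m13..m20 follow the same stride pattern for the output layer.
    y_out = [f(sum(m[13 + 2 * i + k] * y_hid[i] for i in range(4)))
             for k in range(2)]
    return y_in, y_hid, y_out


m = {i: 0.1 * i for i in range(1, 21)}   # arbitrary illustrative strengths
y_in, y_hid, y_out = forward([0.5, -0.2, 0.1], m)
```

Each hidden and output value is a weighted sum of the previous layer passed through f, so the whole pass amounts to two small matrix-vector products.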

ANN model 300 processes data records one at a time, and it “learns” by comparing an initially arbitrary classification of the record with the known actual classification of the record. Using a training methodology known as “backpropagation” (i.e., “backward propagation of errors”), the errors from the initial classification of the first record are fed back into the network and used to modify the network's weighted connections the second time around, and this feedback process continues for many iterations. In the training phase of an ANN, the correct classification for each record is known, and the output nodes can therefore be assigned “correct” values. For example, the node corresponding to the correct class can be assigned a node value of “1” (or 0.9), and the other nodes a node value of “0” (or 0.1). It is thus possible to compare the network's calculated values for the output nodes to these “correct” values, and to calculate an error term for each node (i.e., the “delta” rule). These error terms are then used to adjust the weights in the hidden layers so that in the next iteration the output values will be closer to the “correct” values.
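The error-term calculation can be illustrated with a deliberately reduced sketch: a single layer of tanh nodes trained with the delta rule, rather than full backpropagation through hidden layers. The function names, the learning rate `lr`, and the input values are hypothetical; only the target values 0.9 and 0.1 come from the text above.

```python
import math


def delta_rule_step(weights, inputs, targets, lr=0.5):
    """One training step for a single tanh layer; weights[k][i] links input i to output k."""
    outputs = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]
    # Error term per output node: (target - output) * f'(sum), where tanh' = 1 - y^2.
    deltas = [(t - y) * (1.0 - y * y) for t, y in zip(targets, outputs)]
    new_weights = [[w + lr * d * x for w, x in zip(row, inputs)]
                   for row, d in zip(weights, deltas)]
    return new_weights, outputs


def squared_error(outputs, targets):
    """Sum of squared differences between calculated and "correct" values."""
    return sum((t - y) ** 2 for t, y in zip(targets, outputs))


weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
inputs = [1.0, 0.5, -0.5]
targets = [0.9, 0.1]          # "correct" values for the two output nodes
errors = []
for _ in range(20):
    weights, outputs = delta_rule_step(weights, inputs, targets)
    errors.append(squared_error(outputs, targets))
```

Over the iterations the recorded error shrinks, which mirrors the statement that each pass moves the output values closer to the “correct” values.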

There are many types of neural networks, but the two broadest categories are feed-forward and feedback/recurrent networks. ANN model 300 is a non-recurrent feed-forward network having inputs, outputs and hidden layers. The signals can only travel in one direction. Input data is passed onto a layer of processing elements that perform calculations. Each processing element makes its computation based upon a weighted sum of its inputs. The new calculated values then become the new input values that feed the next layer. This process continues until it has gone through all the layers and determined the output. A threshold transfer function is sometimes used to quantify the output of a neuron in the output layer.

A feedback/recurrent network includes feedback paths, which means that the signals can travel in both directions using loops. All possible connections between nodes are allowed. Because loops are present in this type of network, under certain operations, it can become a non-linear dynamical system that changes continuously until it reaches a state of equilibrium. Feedback networks are often used in associative memories and optimization problems, wherein the network looks for the best arrangement of interconnected factors.

The speed and efficiency of machine learning in feed-forward and recurrent ANN architectures depend on how effectively the crosspoint devices of the ANN crossbar array perform the core operations of typical machine learning algorithms. Although a precise definition of machine learning is difficult to formulate, a learning process in the ANN context can be viewed as the problem of updating the crosspoint device connection weights so that a network can efficiently perform a specific task. The crosspoint devices typically learn the necessary connection weights from available training patterns. Performance is improved over time by iteratively updating the weights in the network. Instead of following a set of rules specified by human experts, ANNs “learn” underlying rules (like input-output relationships) from the given collection of representative examples. Accordingly, a learning algorithm can be generally defined as the procedure by which learning rules are used to update and/or adjust the relevant weights.

The three main learning algorithm paradigms are supervised, unsupervised and hybrid. In supervised learning, or learning with a “teacher,” the network is provided with a correct answer (output) for every input pattern. Weights are determined to allow the network to produce answers as close as possible to the known correct answers. Reinforcement learning is a variant of supervised learning in which the network is provided with only a critique on the correctness of network outputs, not the correct answers themselves. In contrast, unsupervised learning, or learning without a teacher, does not require a correct answer associated with each input pattern in the training data set. It explores the underlying structure in the data, or correlations between patterns in the data, and organizes patterns into categories from these correlations. Hybrid learning combines supervised and unsupervised learning. Parts of the weights are usually determined through supervised learning, while the others are obtained through unsupervised learning.

As previously noted herein, in order to limit power consumption, the crosspoint devices of ANN chip architectures are often designed to utilize offline learning techniques, wherein the approximation of the target function does not change once the initial training phase has been resolved. Offline learning allows the crosspoint devices of crossbar-type ANN architectures to be simplified such that they draw very little power.

Notwithstanding the potential for lower power consumption, executing offline training can be difficult and resource intensive because it is typically necessary during training to modify a significant number of adjustable parameters (e.g., weights) in the ANN model to match the input-output pairs for the training data. FIG. 4 depicts a simplified illustration of a typical read-process-write weight update operation, wherein CPU/GPU cores (i.e., simulated “neurons”) read a memory (i.e., a simulated “synapse”) and perform weight update processing operations, then write the updated weights back to memory. Accordingly, simplifying the crosspoint devices of ANN architectures to prioritize power-saving, offline learning techniques typically means that training speed and training efficiency are not optimized.

FIG. 5 illustrates an artificial neural network (ANN) architecture 500. During feed-forward operation, a set of input neurons 502 each provide an input voltage in parallel to a respective row of weights 504. A weight 504 is a crosspoint device, such as an RPU. The weights 504 each have a settable resistance value, such that a current output flows from the weight 504 to a respective hidden neuron 506 to represent the weighted input. The current output by a given weight is determined as I=V/r, where V is the input voltage from the input neuron 502 and r is the set resistance of the weight 504. The current from each weight adds column-wise and flows to a hidden neuron 506. A set of reference weights 507 have a fixed resistance and combine their outputs into a reference current that is provided to each of the hidden neurons 506. Because conductance values can only be positive numbers, some reference conductance is needed to encode both positive and negative values in the matrix. The currents produced by the weights 504 are continuously valued and positive, and therefore the reference weights 507 are used to provide a reference current, above which currents are considered to have positive values and below which currents are considered to have negative values. By using floating nanocrystal dots (gates) to store the weights in a cross-point synaptic device, the multiple states of the training can be realized with symmetric updating. The weight updating depends on the pulse applied and charges stored in the discrete nanocrystal dots, and non-disturb reading can be performed as well.
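The feed-forward current summation can be sketched numerically as follows. The function names, the resistance and voltage values, and the choice of a single shared reference resistance `r_ref` are illustrative assumptions; only the relation I=V/r, the column-wise summation and the role of the reference current come from the description above.

```python
def column_currents(voltages, resistances):
    """Sum I = V / r down each column; resistances[row][col] is the set resistance."""
    n_cols = len(resistances[0])
    return [sum(v / row[c] for v, row in zip(voltages, resistances))
            for c in range(n_cols)]


def signed_outputs(voltages, resistances, r_ref):
    """Subtract the reference current so that outputs can be read as positive or negative."""
    # ASSUMPTION: every reference weight has the same fixed resistance r_ref.
    i_ref = sum(v / r_ref for v in voltages)
    return [i - i_ref for i in column_currents(voltages, resistances)]


voltages = [1.0, 2.0]                     # one input voltage per row
resistances = [[1.0, 4.0],
               [2.0, 4.0]]                # illustrative resistance values
currents = column_currents(voltages, resistances)
signed = signed_outputs(voltages, resistances, r_ref=2.0)
```

Here the raw column currents are both positive, and comparing them with the reference current yields one positive and one negative value, which is how a strictly positive physical quantity can encode a signed matrix entry.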

The hidden neurons 506 use the currents from the array of weights 504 and the reference weights 507 to perform some calculation. The hidden neurons 506 then output a voltage of their own to another array of weights 504. This array performs in the same way, with a column of weights 504 receiving a voltage from their respective hidden neuron 506 to produce a weighted current output that adds row-wise and is provided to the output neuron 508.

It should be understood that any number of these stages can be implemented, by interposing additional layers of arrays and hidden neurons 506. It should also be noted that some neurons can be constant neurons 509, which provide a constant voltage to the array. The constant neurons 509 can be present among the input neurons 502 and/or hidden neurons 506 and are only used during feed-forward operation.

During back propagation, the output neurons 508 provide a voltage back across the array of weights 504. The output layer compares the generated network response to training data and computes an error. The error is applied to the array as a voltage pulse, where the height and/or duration of the pulse is modulated in proportion to the error value. In this example, a row of weights 504 receives a voltage from a respective output neuron 508 in parallel and converts that voltage into a current which adds column-wise to provide an input to hidden neurons 506. Each hidden neuron 506 combines the weighted feedback signal with a derivative of its feed-forward calculation and stores an error value before outputting a feedback signal voltage to its respective column of weights 504. This back propagation travels through the entire network 500 until all hidden neurons 506 and the input neurons 502 have stored an error value.
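The back propagation step for a single hidden neuron can be sketched as follows. This is an illustrative model only: it modulates pulse height (not duration), clips the height to an assumed driver range, and assumes the feedback current is combined with the derivative by multiplication. The function names and values are hypothetical.

```python
def error_pulse_height(error, gain=1.0, max_height=1.0):
    """Pulse height proportional to the error, clipped to an assumed driver range."""
    return max(-max_height, min(max_height, gain * error))


def hidden_error(feedback_voltages, resistances, derivative):
    """Error value stored by one hidden neuron.

    Each weight converts its feedback voltage into a current (I = V / r), the
    currents add, and the sum is scaled by the derivative of the neuron's
    feed-forward calculation (multiplication is an assumption of this sketch).
    """
    feedback_current = sum(v / r for v, r in zip(feedback_voltages, resistances))
    return feedback_current * derivative


# Illustrative errors computed at two output neurons.
errors = [0.5, -0.25]
pulses = [error_pulse_height(e) for e in errors]

# Illustrative weight resistances (ohms) and feed-forward derivative.
stored = hidden_error(pulses, resistances=[2.0, 4.0], derivative=0.5)
print(pulses, stored)
```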

During weight updates, the input neurons 502 and hidden neurons 506 apply a first weight update voltage forward and the output neurons 508 and hidden neurons 506 apply a second weight update voltage backward through the network 500. The combinations of these voltages create a state change within each weight 504, causing the weight 504 to take on a new resistance value. In this manner, the weights 504 can be trained to adapt the neural network 500 to errors in its processing. It should be noted that the three modes of operation, feed forward, back propagation, and weight update, do not overlap with one another.

As previously noted herein, to accommodate the learning speed requirement for deep neural network application, the embodiments of the present invention provide an analog weight update component. For example, a cross-point synaptic device in a cross-bar structure can enable parallel matrix multiplication and improve the neural network training speed.

One or more embodiments of the invention provide a programmable resistive crosspoint component referred to herein as a crosspoint device, or a resistive processing unit (RPU), which provides local data storage functionality and local data processing functionality. In other words, when performing data processing, the value stored at each RPU is updated in parallel and locally, which eliminates the need to move relevant data in and out of a processor and a separate storage element. Additionally, the local data storage and local data processing provided by the described RPUs accelerate the ANN's ability to implement algorithms such as matrix inversion, matrix decomposition and the like. Accordingly, implementing a machine learning ANN architecture having the described RPU enables implementations that optimize the speed, efficiency and power consumption of the ANN. The described RPU and resulting ANN architecture improve overall ANN performance and enable a broader range of practical ANN applications.

The RPU in the cross-bar structure enables parallel matrix multiplication and substantially enhances the neural network training speed. For online training using parallel processing, each weight element has to show a symmetric updating characteristic, which means the previous state data has to be retrievable after updating is done, as is known in the art. For example, FETs, whose resistance can be well controlled by gate potential, are one candidate weight element for the neural network. Such technical challenges are addressed by the one or more embodiments of the invention described herein by facilitating weight storage elements that use nanocrystal dots (gates).

In one or more examples, charges are pumped into the nanocrystal dots for updating, and the charges are pumped out of the nanocrystal dots for retrieving the previous states. Pulses (gate voltages) of opposite polarity can be applied for pumping the charges in and out. Further, a predetermined amount of Vg is applied to achieve direct tunneling with less power consumption. Further, to read the weights, the gate voltage (pulse) can be very small or substantially 0, and the resulting current or conductance in the channel is measured as the stored weight. The reading can be performed in a manner similar to that of a 3-terminal FET, in one or more examples.
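The symmetric update and non-disturb read described above can be modeled with a short Python sketch. The model is a deliberate simplification: it assumes each pulse moves a fixed unit of charge, and it assumes that stored charge lowers the channel conductance linearly. The class name, parameters, and values are hypothetical.

```python
class ChargeStore:
    """Toy model of a nanocrystal-dot weight element (illustrative only)."""

    def __init__(self, step=1, capacity=10):
        self.step = step          # charge units moved per pulse (assumed constant)
        self.capacity = capacity  # maximum charge units the dots can hold (assumed)
        self.charge = 0

    def pulse(self, polarity):
        """Apply one gate pulse: +1 pumps charge in, -1 pumps charge out."""
        if polarity not in (1, -1):
            raise ValueError("polarity must be +1 or -1")
        new_charge = self.charge + polarity * self.step
        self.charge = max(0, min(self.capacity, new_charge))

    def read(self, base_conductance=1.0, sensitivity=0.05):
        """Read at substantially zero gate voltage; the stored charge is not changed.

        The linear dependence on charge is an assumption of this sketch.
        """
        return base_conductance - sensitivity * self.charge


store = ChargeStore()
before = store.read()
for _ in range(3):
    store.pulse(+1)        # update: pump charge in
updated = store.read()
for _ in range(3):
    store.pulse(-1)        # opposite polarity: pump the same charge out
after = store.read()
print(before, updated, after)
```

In this model, an equal number of opposite-polarity pulses returns the element to its previous state, which is the symmetric updating characteristic that the text requires of a weight element.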

FIG. 6 depicts a cross bar array of cross-point synaptic devices according to embodiments of the present invention, along with voltage sequences illustrating the operation of the RPU. FIG. 6 is a diagram of a two-dimensional (2D) crossbar array 800 that performs forward matrix multiplication, backward matrix multiplication and weight updates according to embodiments of the present invention. Crossbar array 800 is formed from a set of conductive row wires 802, 804, 806 and a set of conductive column wires 808, 810, 812, and 814 that intersect the set of conductive row wires 802, 804, and 806. The intersections between the set of row wires and the set of column wires are separated by RPUs, which are shown in FIG. 6 as resistive elements each having its own adjustable/updateable resistive weight, depicted as σ11, σ21, σ31, σ41, σ12, σ22, σ32, σ42, σ13, σ23, σ33 and σ43, respectively. For ease of illustration, only one RPU 820 is labeled with a reference number in FIG. 6. In forward matrix multiplication, the conduction state (i.e., the stored weights) of the RPU can be read by applying a voltage across the RPU and measuring the current that passes through the RPU.

Input voltages V1, V2, V3 are applied to row wires 802, 804, 806, respectively. Each column wire 808, 810, 812, 814 sums the currents I1, I2, I3, I4 generated by each RPU along the particular column wire. For example, as shown in FIG. 6, the current I4 generated by column wire 814 is according to the equation I4=V1/σ41+V2/σ42+V3/σ43. Thus, array 800 computes the forward matrix multiplication by multiplying the values stored in the RPUs by the row wire inputs, which are defined by voltages V1, V2, V3. The backward matrix multiplication is very similar. In backward matrix multiplication, voltages are applied at column wires 808, 810, 812, 814 then read from row wires 802, 804, 806. For weight updates, which are described in greater detail below, voltages are applied to column wires and row wires at the same time, and the conductance values stored in the relevant cross-point synaptic devices all update in parallel. Accordingly, the multiplication and addition operations required to perform weight updates are performed locally at each RPU 820 of array 800 using the cross-point synaptic device itself plus the relevant row or column wire of array 800. Thus, in accordance with the one or more embodiments of the present invention, no read-update-write cycles (shown in FIG. 4) are required in array 800.
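The forward and backward multiplications performed by array 800 can be sketched in Python as follows. The sketch follows the equation for I4 given above and treats each σ as a resistive weight; the nested-list layout `sigma[column][row]`, the function names, and the numeric values are assumptions made for illustration.

```python
def forward(sigma, row_voltages):
    """Column currents: I_j = sum over rows i of V_i / sigma[j][i]."""
    return [sum(v / r for v, r in zip(row_voltages, column)) for column in sigma]


def backward(sigma, column_voltages):
    """Row currents: I_i = sum over columns j of V_j / sigma[j][i]."""
    n_rows = len(sigma[0])
    return [
        sum(v / column[i] for v, column in zip(column_voltages, sigma))
        for i in range(n_rows)
    ]


# Illustrative resistive weights for 4 column wires and 3 row wires.
# sigma[j][i] corresponds to the weight at column j+1 and row i+1.
sigma = [
    [1.0, 2.0, 4.0],
    [2.0, 2.0, 2.0],
    [4.0, 4.0, 4.0],
    [1.0, 2.0, 4.0],
]

column_currents = forward(sigma, [1.0, 2.0, 4.0])    # V1, V2, V3 on the row wires
row_currents = backward(sigma, [1.0, 1.0, 1.0, 1.0])  # voltages on the column wires
print(column_currents, row_currents)
```

The backward pass uses the same stored weights as the forward pass and only exchanges the roles of the row wires and column wires, which is why no separate copy of the matrix is needed.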

Continuing with the diagram of FIG. 6, in accordance with one or more embodiments, the operation of a positive weight update methodology for RPU 820 and its corresponding weight σ33 at the intersection of conductive row wire 806 and conductive column wire 812 will now be provided. Update generator circuitry (not shown) is provided at the periphery of crossbar array 800 and used as a peripheral “translator” in order to generate necessary voltage pulses in the form of stochastic bit streams that are applied to all RPUs of 2D crossbar array 800.

Accordingly, referring to the ANN implemented using a crosspoint array including RPUs as described herein, in the array, the value of the resistance (or conductance) of each node determines the coupling between nodes, where a node is represented by a cross-point synaptic device in the array. Further, upon training the crosspoint array 800 according to the ANN, the resistance (or conductance) will be different from device to device, depending on the desired coupling. For training a neural network, it is necessary to actively adjust the resistance values. Once the training is complete, the resistance values remain fixed during operation of the crosspoint array circuit, until training begins for a new task.

FIG. 7A depicts an example cross-point synaptic device 820 that uses a nanocrystal gate for weight storage according to one or more embodiments of the present invention. The cross-point synaptic device 820 includes a gate-stack 710 that includes a channel 740 and an insulator 755. The gate-stack further includes 4 terminals: a first electrode 715A, a source 720, a drain 730, and a second electrode 715B. In other examples, the cross-point synaptic device 820 can include additional terminals, for example, dual gates, tri-gates, etc. The terminals of the cross-point synaptic device 820 can provide an option for modulating or tuning the symmetry of weight updates. The channel 740 can be made of Si, SiGe, SiC, CNT, a III-V material, or any other semiconductor material.

The gate-stack 710 further includes a layer of nanocrystal dots 750, which can be Si, SiGe, Ge, metal, or any other such material. The sizes of the nanocrystal dots 750 are adjustable. In one or more examples, the nanocrystal dots 750 have substantially uniform size. Alternatively, the size of the nanocrystal dots 750 increases (or decreases) from the drain 730 to the source 720. Further, the shape of the nanocrystals 750 can be varied. For example, the nanocrystal dots 750 are square, round, half moon, or of any other shape. Further, the insulator material 725 between the nanocrystal dots 750 can be conventional dielectric materials, such as SiO2, SiON, Si3N4, a high-k dielectric, a ferroelectric material, or an RRAM material, or any other dielectric. In one or more examples, the nanocrystal dots 750 are grown embedded into the dielectric material of layer 725. The size of the nanocrystal dots 750 is controlled by temperature.

The write/weight update in the cross-point synaptic device 820, in this case, is performed by applying a voltage between the first electrode 715A and the second electrode 715B, which results in pumping charges into the nanocrystal dots 750. For example, each nanocrystal dot 750 can represent one state, multiple states, or a weight value depending on the size of the nanocrystal dots 750, and charges are stored or released by pulses of different polarity with various durations and amplitudes. Because the size of the nanocrystal dots 750 can be varied, and smaller nanocrystal dots 750 may store less charge than larger ones, the weight value and the number of states represented by the nanocrystal dots 750 can also be varied. Based on the charges that are pumped into the nanocrystal dots 750, the corresponding subset of nanocrystal dots stores the charge, thus updating the weight. The weight can be erased by pumping a similar amount of charge out of the nanocrystal dots 750, the pumping out including providing an opposite polarity of pulses to the column voltage 832. During weight update, a small amount of Vg is applied to achieve direct tunneling with less power consumption.

For reading the weight value stored in the cross-point synaptic device 820, the weight value is sensed based on the current flow through the channel 740, between source 720 and drain 730. The current flow is modulated by the charges stored inside of the nanocrystal dots 750. For example, a predetermined gate voltage (pulse), which can be very small or substantially 0, is applied to the cross-point synaptic device 820, and current or conductance in the channel 740 is sensed. The sensed current provides the weight stored at the cross-point synaptic device 820. When reading, the second electrode 715B is electrically floating, i.e., the second electrode 715B is disconnected so that the potential that is applied for the reading operation does not screen any electric field generated by the charged nanocrystal dots 750.

FIG. 7B depicts another example cross-point synaptic device 820 that uses a nanocrystal gate for weight storage according to one or more embodiments of the present invention. In this case, the cross-point synaptic device 820 is a 3-terminal device that includes all of the components as described earlier, except that the gate-stack 710 only includes the first electrode 715A, the source 720, and the drain 730 (without the second electrode 715B). The gate-stack 710 further includes the layer of nanocrystal dots 750 as described herein.

For storing a weight in the cross-point synaptic device 820, charges are pumped into or out of the nanocrystal dots 750 via the first electrode 715A by applying a voltage between the first electrode 715A and the source 720 and/or drain 730 (i.e., the channel 740). The nanocrystal dots 750 facilitate achieving a substantially analog state because of the amount of charge that the nanocrystal dots 750 can store. The amount of charge stored can be varied based on the row voltage 830 and the column voltage 832 applied to the cross-point synaptic device 820.

For example, each nanocrystal dot 750 can represent one state, multiple states, or a weight value depending on the size of the nanocrystal dots 750, and charges are stored or released by pulses of different polarity with various durations and amplitudes. Because the size of the nanocrystal dots 750 can be varied, and smaller nanocrystal dots 750 may store less charge than larger ones, the weight value and the number of states represented by the nanocrystal dots 750 can also be varied. Based on the charges that are pumped into the nanocrystal dots 750, the corresponding subset of nanocrystal dots stores the charge, thus updating the weight. The weight can be erased by pumping a similar amount of charge out of the nanocrystal dots 750, the pumping out including providing an opposite polarity of pulses to the column voltage 832. During weight update, a small amount of Vg is applied to achieve direct tunneling with less power consumption.

For reading the weight value stored in the cross-point synaptic device 820, the weight value is sensed based on the current flow through the channel 740, between source 720 and drain 730. The current flow is modulated by the charges stored inside of the nanocrystal dots 750. For example, a predetermined gate voltage (pulse), which can be very small or substantially 0, is applied to the cross-point synaptic device 820, and current or conductance in the channel 740 is sensed. The sensed current provides the weight stored at the cross-point synaptic device 820.
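The relationship between dot size, stored charge, and the number of representable states can be illustrated with the following Python sketch. It assumes that each dot holds an integer number of charge units proportional to its size, and that dots fill in order and empty in reverse order; neither the fill order nor the class and method names come from the described embodiments.

```python
class NanocrystalGate:
    """Toy model of a layer of nanocrystal dots with per-dot charge capacity."""

    def __init__(self, dot_capacities):
        # Assumed: capacity (in charge units) scales with the size of each dot.
        self.capacities = list(dot_capacities)
        self.charges = [0] * len(self.capacities)

    def write(self, units):
        """Pump up to `units` of charge in; return the amount actually stored."""
        stored = 0
        for i, capacity in enumerate(self.capacities):
            take = min(capacity - self.charges[i], units - stored)
            self.charges[i] += take
            stored += take
            if stored == units:
                break
        return stored

    def erase(self, units):
        """Opposite-polarity pulse: pump up to `units` out; return amount removed."""
        removed = 0
        for i in reversed(range(len(self.charges))):
            take = min(self.charges[i], units - removed)
            self.charges[i] -= take
            removed += take
            if removed == units:
                break
        return removed

    def states(self):
        """Number of distinct total-charge levels the gate can represent."""
        return sum(self.capacities) + 1

    def read(self):
        """Total stored charge, which modulates the channel current."""
        return sum(self.charges)


# Illustrative dot sizes that increase from the drain to the source.
gate = NanocrystalGate([1, 2, 3])
n_states = gate.states()
gate.write(4)
after_write = list(gate.charges)
gate.erase(4)
after_erase = list(gate.charges)
print(n_states, after_write, after_erase)
```

Under these assumptions, larger or more numerous dots increase the number of charge levels available, which corresponds to the statement that the number of states can be varied with the dot size.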

FIG. 7C depicts another example cross-point synaptic device 820 that has multiple layers of nanocrystal dots 750 according to one or more embodiments of the present invention. The nanocrystal dots 750 are grown in layers stacked upon each other in the dielectric layer 725.

A row voltage sequence or bit stream 830, which is applied to row wire 806, is shown as a sequence of voltage pulses representing weight updates having a voltage of zero or a voltage of +0.5 VSET. A column voltage sequence or bit stream 832, which is applied to column wire 814, is shown as a sequence of voltage pulses also representing weight updates having either a voltage of zero or a voltage of −0.5 VSET. The example voltage sequences 830, 832 represent a positive resistance change phase of the weight update. After the positive weight updates are performed, a separate set of sequences with the polarity of the respective voltages reversed can be used to update weights in a negative direction for those weights that need such correction.

Alternatively, or in addition, in one or more examples, a voltage sequence that is applied to the cross-point synaptic device 820 is the result from the difference between row voltage sequence 830 and column voltage sequence 832.
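The interaction of the row voltage sequence 830 and the column voltage sequence 832 can be sketched in Python as follows. The sketch uses fixed bit patterns in place of stochastic bit streams so that the result is reproducible, and it assumes that the device changes state only in time slots where the full VSET appears across it. That threshold behavior, the names, and the values are assumptions of this illustration.

```python
V_SET = 1.0  # illustrative value, arbitrary units


def device_voltages(row_bits, col_bits, v_set=V_SET):
    """Voltage across the device per time slot: row voltage minus column voltage.

    A row bit of 1 is a +0.5*VSET pulse; a column bit of 1 is a -0.5*VSET pulse.
    """
    return [
        (0.5 * v_set * r) - (-0.5 * v_set * c)
        for r, c in zip(row_bits, col_bits)
    ]


def count_updates(row_bits, col_bits, v_set=V_SET):
    """Count time slots in which the full VSET appears across the device."""
    return sum(1 for v in device_voltages(row_bits, col_bits, v_set) if v >= v_set)


# Fixed example bit streams (a real update generator would draw these randomly).
row_stream = [1, 0, 1, 1, 0, 1]
col_stream = [1, 1, 0, 1, 0, 1]

voltages = device_voltages(row_stream, col_stream)
updates = count_updates(row_stream, col_stream)
print(voltages, updates)
```

Because an update occurs only where a row pulse and a column pulse coincide, the expected number of updates is proportional to the product of the two pulse probabilities, which is how the array can perform the multiplication locally at each device.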

Further, in yet another embodiment, the cross-point synaptic device 820 is a 4-terminal device similar to that in FIG. 7A, with the gate-stack including the first electrode 715A, the source 720, the drain 730, and the second electrode 715B, and the nanocrystal dots 750 being grown in the stacked layers as shown in FIG. 7C. The read and write operations are performed in the same manner as described earlier (FIG. 7A), and are not repeated here.

In each of the embodiments described above (FIG. 7A, 7B, 7C), the cross-point synaptic device includes a substrate (not shown) that can be a bulk, SOI, planar, FIN, or any other type of substrate layer.

FIG. 8 depicts a system 1300 in which the crossbar array 800 is controlled using a controller 1310 for performing the matrix-matrix multiplication among other operations according to one or more embodiments of the present invention. For example, the controller 1310 sends the input matrices to be multiplied by the crossbar array 800. In one or more examples, the controller 1310 stores the values in the crossbar array 800 and sends the input vectors 910. In one or more examples, the controller 1310 and the crossbar array 800 are coupled in a wired or a wireless manner, or a combination thereof. The controller 1310 further sends an instruction/command to the crossbar array 800 to initiate the matrix-matrix multiplication. The controller 1310 can further read the output vectors 915 from the crossbar array 800 after receiving a notification that the matrix-matrix multiplication has been performed. The controller 1310 can be a processing unit, or a computing system, such as a server, a desktop computer, a tablet computer, a phone, and the like. The controller 1310 can include a memory device that has computer executable instructions stored therein, the instructions, when executed by the controller, causing the matrix-matrix computation.
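The controller flow described above can be sketched in Python under simplifying assumptions. The two classes below are hypothetical software stand-ins, not an interface defined by the described embodiments: the array is modeled as storing conductance-like values directly, and the matrix-matrix multiplication is performed as one vector-matrix multiplication per input vector.

```python
class CrossbarArrayModel:
    """Software stand-in for the crossbar array (hypothetical interface)."""

    def __init__(self):
        self.values = []

    def store(self, matrix):
        """Store one value per crosspoint; values[i][j] is row i, column j."""
        self.values = [list(row) for row in matrix]

    def multiply(self, vector):
        """Return one output per column: sum over rows of input * stored value."""
        n_cols = len(self.values[0])
        return [
            sum(v * row[j] for v, row in zip(vector, self.values))
            for j in range(n_cols)
        ]


class ControllerModel:
    """Software stand-in for the controller (hypothetical interface)."""

    def __init__(self, array):
        self.array = array

    def matmul(self, input_vectors, stored_matrix):
        """Store one matrix in the array, then send each input vector in turn."""
        self.array.store(stored_matrix)
        return [self.array.multiply(vector) for vector in input_vectors]


controller = ControllerModel(CrossbarArrayModel())
outputs = controller.matmul(
    input_vectors=[[1, 2], [3, 4]],
    stored_matrix=[[5, 6], [7, 8]],
)
print(outputs)
```

In this sketch the output vectors are returned synchronously; the notification step described in the text is omitted for brevity.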

The technical solutions described herein improve the neuromorphic development process by facilitating weight storage element devices with analog as well as memory properties. The technical solutions described herein speed up deep neural network processing and enable online training capability by facilitating cross-point synaptic devices with back-propagation updating to be used for deep neural network implementation. The technical solutions accordingly improve computational technology, and particularly RPU arrays and ANNs implemented using such RPU arrays.

The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the Figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments described herein. 

What is claimed is:
 1. A crossbar array comprising: a cross-point synaptic device at each of a plurality of crosspoints, the cross-point synaptic device comprising: a weight storage element comprising a set of nanocrystal dots; and at least three terminals for interacting with the weight storage element, wherein a weight is stored in the weight storage element by sending a first electric pulse via a gate terminal from the at least three terminals, the first electric pulse causes the nanocrystal dots to store a corresponding charge, and the weight is erased from the weight storage element by sending a second electric pulse via the gate terminal, the second electric pulse having an opposite polarity of the first electric pulse, wherein the nanocrystal dots have a uniformly varying size from a drain terminal to a source terminal from the at least three terminals, the size controlled by temperature when growing the nanocrystal dots in a dielectric material.
 2. The crossbar array of claim 1, wherein the nanocrystal dots have a decreasing size from a drain terminal to a source terminal from the at least three terminals, the size controlled by temperature when growing the nanocrystal dots in a dielectric material.
 3. The crossbar array of claim 1, wherein the nanocrystal dots have an increasing size from a drain terminal to a source terminal from the at least three terminals, the size controlled by temperature when growing the nanocrystal dots in a dielectric material.
 4. The crossbar array of claim 1, wherein the nanocrystal dots have one of a round shape, a half-moon shape, and a square shape.
 5. The crossbar array of claim 1, wherein the nanocrystal dots are in a single layer in the gate terminal.
 6. The crossbar array of claim 1, wherein the nanocrystal dots are stacked in multiple layers in the gate terminal.
 7. A system comprising: a controller; and a crossbar array coupled with the controller, the crossbar array configured to operate as a neural network, the crossbar array comprising: a cross-point synaptic device at each of a plurality of crosspoints, the cross-point synaptic device comprising: a weight storage element comprising a set of nanocrystal dots; and at least three terminals for interacting with the weight storage element, wherein a weight is stored in the weight storage element by sending a first electric pulse via a gate terminal from the at least three terminals, the first electric pulse causes the nanocrystal dots to store a corresponding charge, and the weight is erased from the weight storage element by sending a second electric pulse via the gate terminal, the second electric pulse having an opposite polarity of the first electric pulse, wherein the nanocrystal dots have a uniformly varying size from a drain terminal to a source terminal from the at least three terminals, the size controlled by temperature when growing the nanocrystal dots in a dielectric material.
 8. The system of claim 7, wherein the nanocrystal dots have an increasing size from a drain terminal to a source terminal from the at least three terminals, the size controlled by temperature when growing the nanocrystal dots in a dielectric material.
 9. The system of claim 7, wherein the nanocrystal dots have a decreasing size from a drain terminal to a source terminal from the at least three terminals, the size controlled by temperature when growing the nanocrystal dots in a dielectric material.
 10. The system of claim 7, wherein the nanocrystal dots have one of a round shape, a half-moon shape, and a square shape.
 11. The system of claim 7, wherein the nanocrystal dots are in a single layer in the gate terminal.
 12. The system of claim 7, wherein the nanocrystal dots are stacked in multiple layers in the gate terminal.
 13. A method for controlling weight stored in a crossbar array used for implementing a neural network, the method comprising: updating a weight stored by a weight storage element of a cross-point synaptic device at each of a plurality of crosspoints of the crossbar array by sending a first electric pulse via a gate terminal of the weight storage element, the first electric pulse causes a set of nanocrystal dots in the gate terminal to store a corresponding charge representing the weight, wherein the nanocrystal dots have a uniformly varying size from a drain terminal to a source terminal from the at least three terminals, the size controlled by temperature when growing the nanocrystal dots in a dielectric material; and erasing the weight stored in the weight storage element by sending a second electric pulse via the gate terminal, the second electric pulse having an opposite polarity of the first electric pulse.
 14. The method of claim 13, wherein the nanocrystal dots have an increasing size from a first terminal to a second terminal of the weight storage element, the size controlled by temperature when growing the nanocrystal dots in a dielectric material.
 15. The method of claim 13, wherein the nanocrystal dots have one of a round shape, a half-moon shape, and a square shape.
 16. The method of claim 13, wherein the nanocrystal dots are in a single layer in the gate terminal.
 17. The method of claim 13, wherein the nanocrystal dots are stacked in multiple layers in the gate terminal. 